Abstract

A  new  methodology  is  given  in  this  paper  to  obtain  a  near-optimal  strategy  (i.e.,  specification  of  courses 
of  action  over  time),  which  is  also  robust  to  environmental  perturbations  (unexpected  events  and/or 
parameter  uncertainties),  to  achieve  the  desired  effects.  A  dynamic  Bayesian  network  (DBN)-based 
stochastic  mission  model  is  employed  to  represent  the  dynamic  and  uncertain  nature  of  the  environment. 
Genetic  algorithms  are  applied  to  search  for  a  near-optimal  strategy  with  DBN  serving  as  a  fitness 
evaluator.  The  probability  of  achieving  the  desired  effects  (namely,  the  probability  of  success)  at  a 
specified  terminal  time  is  a  random  variable  due  to  uncertainties  in  the  environment.  Consequently,  we 
focus  on  signal-to-noise  ratio  (SNR),  a  measure  of  mean  and  variance  of  the  probability  of  success,  to 
gauge  the  goodness  of  a  strategy.  The  resulting  strategy  will  not  only  have  a  relatively  high  probability 
of  inducing  the  desired  effects,  but  also  be  robust  to  environmental  uncertainties. 

Keywords:  Effects-based  operations,  optimization,  organizational  design,  robustness,  signal-to-noise 
ratio,  Taguchi  method,  dynamic  Bayesian  networks,  genetic  algorithms,  confidence  region,  hypothesis 
testing 

1.  Introduction 

Robustness  is  a  key  issue  in  stochastic  planning  problems  under  uncertainty.  This  paper  describes  the 
application  of  dynamic  Bayesian  networks,  along  with  evolutionary  optimization  through  genetic 
algorithms,  to  derive  robust  strategies  that  induce  the  desired  effects  in  a  mission  environment.  The 
methodology  discussed  here  is  applicable  to  both  military  organizations  and  commercial  enterprises. 

An organization's ability to choose an efficient and effective strategy for mission execution is
critical to its performance. Given the dynamic nature of a modern military environment, an
effective C2 strategy is one that creates the desired effects at the right place and at the right time.
Actions constitute the means by which an organization attempts to shape the future. However,
environmental conditions also affect the feasibility of an organization's actions, making some
strategies more likely to succeed than others. The uncertainty about the dynamics of potential
interactions between an organization's actions and its environment can result from two different
sources: (i) the inability to predict some of the indirect cross-influence effects of the organization's
actions [Leblebici81], and (ii) the stochastic nature of the dynamic environment faced by the
organization [Emery65]. Consequently, the extent of an organization's control over the effects it
desires to achieve is limited and, in some cases, indirect. The corresponding models must capture
and quantify the influence of the organization's actions, various stochastic events, and direct or
latent effects.

In  most  cases,  there  are  a  large  number  of  cause-effect  relationships  within  an  environment,  many 
of  which  are  not  observable  by  the  organization.  Probabilistic  models,  such  as  dynamic  Bayesian 
networks,  are  natural  candidates  for  representing  uncertainties  in  a  dynamic  environment.  A  robust 
strategy  seeks  to  maximize  the  probability  of  successfully  achieving  the  desired  effects,  while 
minimizing  its  variability. 

This paper introduces a framework for devising a robust organizational strategy to induce desired
effects in a dynamic and uncertain mission environment. We develop a normative model of the
stochastic environment, based on a dynamic Bayesian network (DBN), to infer indirect influences and
to track the time propagation of effects in complex systems. For a specified set of mission goals
(i.e., desired effects) and organizational constraints, intermediate organizational objectives are
derived, and a near-optimal organizational strategy is obtained via genetic algorithms, where the
DBN serves as a fitness evaluator for candidate strategies. The results of this paper form a
foundation for current research on dynamic adaptation of organizational strategies.


The  remainder  of  the  paper  is  organized  as  follows.  In  section  2,  we  will  formalize  the  problem  as  a 
graph  model,  which  we  call  an  effects-based  mission  model.  This  model  represents  the  concepts  from 
effects-based operations (EBO) (see [McCrabb01], [Davis01]) in the form of a Bayesian network.
Section 3 describes our approach to this problem, which combines DBN with genetic algorithms to
compute robust action strategies. Signal-to-noise ratio (SNR), computed from Monte Carlo runs, is used
as  a  criterion  of  robustness.  In  section  4,  two  conceptual  examples,  one  commercial  and  the  other 
military,  are  solved  to  demonstrate  the  feasibility  of  the  methodology.  Finally,  we  conclude  with  a 
summary  of  current  research  and  future  research  directions. 

2.  Model  and  Formulation  for  Strategy  Optimization 

A  stochastic  planning  problem  in  an  uncertain  environment  can  be  defined  as  follows:  given  an  initial 
environment  state,  determine  optimal  action  sequences  that  will  bring  the  environment  to  a  specified 
destination  (goal)  state  at  a  specified  time  with  a  relatively  high  probability.  The  destination,  in  our 
case,  is  the  set  of  desired  effects. 

The  process  to  solve  this  problem  is  to: 

(i)  Represent  the  joint  dynamics  of  the  organization  and  its  environment; 

(ii)  Optimally  select  appropriate  courses  of  action; 

(iii)  Assess  the  probability  of  successfully  achieving  the  desired  effects  and  the  corresponding  risks. 

As illustrated in Fig. 1, a dynamically evolving effects-based mission model
$G_k = G(t_k) = (V, E, P_k)$, which can be viewed as a Bayesian network at time $t_k$, combines
knowledge about the organization and its environment. $G_k$ is a directed acyclic graph consisting
of a set of nodes $V$ and a set of directed edges $E$ with a fixed structure. Every node is
considered a random variable that assumes Boolean values: either true ('1') or false ('0'). For
each node $v_i \in V$, we define a probability mass function (pmf) $P_k(v_i) = P\{v_i(t_k)\}$ to
characterize the environment uncertainty at time $t_k$.


A  —  Actions 
B  —  Exogenous  Events 
C  —  Intermediate  Effects 
D  —  Desired  Effects 


Fig. 1: A Simple Effects-Based Mission Model

The dynamic evolution of the effects-based mission model unfolds through a finite-horizon
timeline, which is discretized into $T$ time slices (from $t_1$ to $t_T$). Each time slice
represents a snapshot of the evolving temporal process [Kanazawa95]. This evolution can be depicted
as a DBN, as shown in Fig. 2 ($t_0$ is the time before the first time slice). The nodes in this
network have causal-temporal relationships with each other. The solid arcs are “synchronic” and
portray the causal relationships within a single time slice, and the dashed edges are “diachronic”
and show the temporal evolution of the model between neighboring time slices [Boutilier98]. With
these assumptions, the DBN is Markovian.

Fig. 2: Time-evolution of Effects-Based Mission Model as a DBN


Based  on  Fig. 2,  the  key  elements  of  this  model  are  as  follows: 

(i) Critical objects (including centers of gravity, or COG [McCrabb01]) that constitute the
environment of interest, with $X(t_k) = \{P_k(v_i) \mid v_i \in V\}$ portraying the overall state
of the environment at time $t_k$;

(ii) Objectives (to be framed in terms of desired effects) $D = \{D_n \mid 1 \le n \le N_D\}$,
specified by the desired outcomes ('1' or '0') and the corresponding terminal time $t_{D_n}$ for
each effect: $D_n(t_{D_n}) = 1$ or $D_n(t_{D_n}) = 0$. Here $N_D = |D|$ is the total number of
effects we are interested in;

(iii) Critical/important exogenous events, regarded as noise factors, whose occurrence is beyond the
control of the organization, but will affect the environmental dynamics:
$B = \{B_j \mid 1 \le j \le N_B\}$. $N_B = |B|$ is the total number of exogenous events in the
environment. In many cases, one has partial knowledge of the statistical properties (e.g., means
and variances, probability distributions) for these events. For instance, if event $B_1$ in Fig. 1
occurs with a probability that is uniformly distributed over [0.2, 0.6] at time $t_k$, then
$P_k\{B_1 = 1\} = p_1$ and $P_k\{B_1 = 0\} = 1 - p_1$, where $p_1 \sim U[0.2, 0.6]$. The prior
pmfs in the model are application specific and are normally elicited from domain experts. We may
also consider the enemy actions (or competitor actions in business applications) as exogenous
events. Note that some events may have inhibiting effects in that they reduce the probability of
achieving certain desired effects;

(iv) Feasible actions, regarded as control factors, which can be employed by an organization to
influence the state of the environment: $A = \{A_q \mid 1 \le q \le N_A\}$, where $N_A = |A|$ is
the total number of feasible actions. Each action will take a value of “true” or “false” at each
time slice once the decision maker determines a strategy. That is, $P_k\{A_q = 1\} = 1$ if action
$A_q$ is activated at time slice $t_k$; otherwise, $P_k\{A_q = 1\} = 0$. If there are no
constraints on actions, potential choices for each action consist of $2^T$ strings of binary
digits ranging from ‘$0(t_1) \cdots 0(t_T)$’ to ‘$1(t_1) \cdots 1(t_T)$’. In real applications,
however, the available potential actions may be very limited and far fewer than $2^T$. Without
loss of generality, we assume that $(r_q + 1) \ll 2^T$ feasible choices for action $A_q$ from a
domain $\Omega_{A_q} = \{a_q^0, a_q^1, a_q^2, \ldots, a_q^{r_q}\}$ are available. Each element
$a_q^i$ ($0 \le i \le r_q$) in this set maps to a string ‘$a_q^i(t_1) \cdots a_q^i(t_T)$’ with
$a_q^i(t_k) \in \{0, 1\}$ ($0 < k \le T$). Let $f_{a_q}$ be the cost of selecting string $a_q$ for
action $A_q$. A strategy under a given initial environment state $X(t_0)$ is a set of strings for
all the actions: $S = \{(a_1, a_2, \ldots, a_{N_A}) \mid a_q \in \Omega_{A_q}, 1 \le q \le N_A\}$.
Thus, the space of feasible action strategies is
$\Omega_S = \Omega_{A_1} \times \Omega_{A_2} \times \cdots \times \Omega_{A_{N_A}}$, and the cost
of the strategy is $F_S = \sum_{q=1}^{N_A} f_{a_q}$.

(v) Intermediate effects are defined to differentiate those effects that are not desired end effects
per se, but are useful in connecting the actions and events to the desired effects. These effects
are termed direct effects in Effects-Based Operations [McCrabb01]. All the intermediate effects
form a set $C = \{C_m \mid 1 \le m \le N_C\}$ with $N_C = |C|$. Fig. 2 shows that only desired
effects and intermediate effects are connected by diachronic edges;

(vi) Direct influence dependencies between all the objects of the system and their mechanisms are
specified by conditional probability tables (CPTs) in Bayesian networks parlance. We assume that
actions and events in our model are root nodes and the desired effects are conditionally
independent. Consequently, directed synchronic edges $E = \{\langle v_i, v_j \rangle\}$ exist only
from an action to an intermediate effect or a desired effect ($v_i \in A$, $v_j \in C \cup D$),
from an event to an intermediate effect or a desired effect ($v_i \in B$, $v_j \in C \cup D$), and
from an intermediate effect to a desired effect or another intermediate effect ($v_i \in C$,
$v_j \in C \cup D$). Diachronic edges are directed from the immediately prior time slice to the
current one for each intermediate and desired effect. Evidently, the number of CPTs needed is
$(N_C + N_D)$.

(vii) The total budget available for the organization is constrained by $F_{budget}$.

It can be seen that four types of nodes are defined in our effects-based mission model such that
$V = A \cup B \cup C \cup D$. The total number of nodes in the mission model is
$N = N_A + N_B + N_C + N_D$. Define the timeline starting from initial time $t_0$ to the terminal
time $t_T$ for achieving all the desired effects, where $t_T = \max_n (t_{D_n})$,
$1 \le n \le N_D$. Without loss of generality, the initial environment state $X(t_0)$ is assumed
known and deterministic; in other words, all the effects (desired or intermediate) are observed as
being “true” or “false”. Given that the total number of nodes is $N$, the number of possible states
is $2^N$. The states span from all “false” to all “true”, with each state having a probability, say
$P_M$, subject to the constraint $\sum_{M=1}^{2^N} P_M = 1$.

Conceptually, the problem is to achieve the desired effects $\{D\}$ with a very high probability at
specified times. Thus, we only need to focus on the marginal probabilities of all the $D$'s. For
example, in Fig. 2, if $D_1(t_{D_1}) = 1$ and $D_2(t_{D_2}) = 1$ are desired, the objective of our
problem is to raise the initial $P_{11}(t_0) = 0$ to a “statistically significant” $P_{11}(t_T)$,
where $P_{ij}$ is the joint probability of desired effect $D_1$ in state $i$ and desired effect
$D_2$ in state $j$. Evidently, $P_{00}(t_k) + P_{01}(t_k) + P_{10}(t_k) + P_{11}(t_k) = 1$ holds
for all $t_k$.

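The mathematical formulation of the strategy optimization problem is as follows:

$$\max_{S} P\{D(t_k) \mid X(t_0), S\}
  = \max_{S} P\{D_1(t_{D_1}) D_2(t_{D_2}) \cdots D_{N_D}(t_{D_{N_D}}) \mid X(t_0), S\}
  = \max_{S} \left( \prod_{n=1}^{N_D} P\{D_n(t_{D_n}) \mid X(t_0), S\} \right)
  \quad \text{(assumption (iii))} \tag{1}$$

subject to:

$$F_S = \sum_{q=1}^{N_A} f_{a_q} \le F_{budget} \tag{2}$$
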


3.  Solution  Approach 

3.1  Overview  of  the  Solution  Approach 

As  shown  in  Fig. 3,  our  approach  to  solve  the  strategy  optimization  problem  combines  concepts  from 
robust  design,  dynamic  Bayesian  networks  and  heuristic  optimization  algorithms.  DBNs,  which  adopt 
probability  evaluation  algorithms  such  as  the  junction  tree  for  stochastic  inference  [Jordan99],  are  used 
to  model  the  dynamics  of  the  environment  and  to  calculate  the  probability  of  desired  effects  at  specified 
times.  Monte  Carlo  runs  are  made  to  account  for  uncertainty  in  system  parameters  in  the  inner  loop  of 
DBN.  That  is,  disturbances  are  introduced  by  randomly  choosing  network  parameters  (prior  pmfs  of 
events  and  conditional  probabilities).  In  each  Monte  Carlo  run,  DBN  will  evaluate  the  joint  probability 
of  achieving  the  desired  effects.  The  results  of  Monte  Carlo  runs  provide  a  histogram,  and  we 
approximate  it  as  a  Gaussian  density  (based  on  the  Central  Limit  Theorem)  with  sample  mean  and 
sample  variance.  Using  the  sample  mean  and  variance  and  following  robust  design  techniques  of 
Taguchi  [Phadke89],  a  signal-to-noise  ratio  (SNR)  is  computed;  this  criterion  maximizes  the  probability 
of  achieving  the  desired  effects  while  minimizing  its  variability.  A  genetic  algorithm  is  employed  in  the 
outer  loop  to  optimize  the  action  strategies. 


[Figure; recoverable label: “Monte Carlo Runs”]

Fig. 3: Approach Overview


Conceptually, the probability of achieving the desired effects is a function of actions $A$,
exogenous events $B$ and time $t_k$, that is, $P(D) = f(A, B, t_k)$. In each iteration of the
genetic algorithm, we choose candidate strategies, thereby fixing the values of $A$, so the
probability becomes a function of events $B$ and time $t_k$, that is, $P(D \mid A) = g(B, t_k)$.
Then, in each Monte Carlo run of DBN inference, for the given sequences of actions $A$, we randomly
select the occurrence probabilities of exogenous events $B$. Consequently, from a single Monte
Carlo run, we have $P(D \mid A, B) = h(t_k)$. The Monte Carlo runs inside the DBN inference thus
make it possible to measure the robustness of a strategy in an uncertain environment in terms of
the signal-to-noise ratio.

3.2  Probability  Propagation  through  DBN 

Bayesian networks (BN), also known as probabilistic networks, causal networks or belief networks,
are a formalism for representing uncertainty in a way that is consistent with the axioms of
probability theory [Pearl88]. As a graphical model with a strong mathematical foundation, the
formalism has grown enormously over the last two decades. Indeed, there is now a fairly large set
of theoretical concepts and results [Jordan99], as well as software tools for model construction,
learning and analysis, such as Microsoft’s MSBNX [Microsoft], Nettica [Nettica] and Matlab Toolbox
[Murphy].

Given a set of nodes $V = \{v_1, v_2, \ldots, v_N\}$, a Bayesian network computes the joint
probability of variables in the network via:

$$P(v_1, v_2, \ldots, v_N) = \prod_{i=1}^{N} P(v_i \mid \pi(v_i)) \tag{3}$$

where $\pi(v_i)$ is a possible instantiation of the parent nodes of $v_i$. This equation is derived
from the chain rule of probability and conditional independence [Heckerman95]. To be precise, given
the state of a node’s parents, the node is conditionally independent of all its other ancestors.
Here, we use “parents” to denote the direct fan-in nodes, and “ancestors” to denote the parents’
parents, and so on.
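
The factorization in Eq. (3) can be illustrated with a minimal Python sketch. The three-node graph (an action A and an event B as parents of an effect C) and all numbers below are hypothetical and are not taken from the paper; the names `joint`, `P_A`, `P_B` and `P_C1_GIVEN` are illustrative only.

```python
from itertools import product

# Hypothetical numbers for a three-node graph A -> C <- B (not taken from the paper).
P_A = {1: 1.0, 0: 0.0}   # action node: activated with certainty
P_B = {1: 0.3, 0: 0.7}   # exogenous event
P_C1_GIVEN = {(1, 1): 0.4, (1, 0): 0.8, (0, 1): 0.05, (0, 0): 0.2}  # P(C=1 | A, B)


def joint(a, b, c):
    """Eq. (3) for this graph: P(a, b, c) = P(a) * P(b) * P(c | a, b)."""
    p_c1 = P_C1_GIVEN[(a, b)]
    return P_A[a] * P_B[b] * (p_c1 if c == 1 else 1.0 - p_c1)


# Summing the factored joint over all states recovers total probability and marginals.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
p_c_true = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
```

Because the root nodes carry only priors and the child carries one CPT, the joint over all eight states is specified by a handful of numbers rather than a full table.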

A  major  drawback  of  the  standard  theory  of  Bayesian  networks  is  that  there  is  no  natural 
mechanism  for  representing  time  [Aliferis96] .  Dynamic  Bayesian  networks  (DBN)  are  normally  used 
for  representing  Bayesian  networks  that  also  take  into  account  temporal  information.  As  we  see  from 
Fig. 2,  DBN  is  a  compact,  factored  representation  of  a  Markov  process  [D’Ambrosio99].  Since  the  state 
of  the  environment  is  still  static  during  one  time  slice,  DBN  can  be  decomposed  as  a  sequence  of  static 
Bayesian  networks  with  certain  connections  [Barrientos98]. 

Based on the Markov hypothesis, the probability of the state at time slice $t_k$ in a DBN, given
all the evidence (in our case, actions and events) up to that time, is given by [Russell95]:

$$P\left(X(t_k) \mid \{X(t_i)\}_{i=0}^{k-1}, \{A(t_i), B(t_i)\}_{i=1}^{k}\right)
  = P\left(X(t_k) \mid X(t_{k-1}), A(t_k), B(t_k)\right) \tag{4}$$

In our model, intermediate and desired effects in adjacent time slices have temporal links; other
nodes are assumed to be temporally independent [Tawfik00]. Temporal independence implies that the
probability mass function of node $v_i$ at time $t_k$ and that of $v_j$ at time $t_q$, which are
not temporally connected, are independent, that is,
$P\{v_i(t_k) \mid v_j(t_q)\} = P\{v_i(t_k)\} = P_k(v_i)$ for $t_k \ne t_q$.

Fig. 4 shows the augmented Bayesian network that is used for probability propagation in the
effects-based mission model. It is logically extended from the initial static Bayesian network by
introducing dummy nodes for all the intermediate and desired effects. Dummy nodes are defined as
$V^0 = \{v_i^0 \mid v_i \in C \cup D\}$ with $P_{k+1}\{v_i^0\} = P_k\{v_i\}$. The corresponding
CPTs are listed in Tables I-III. These data will be used in a later example.


Fig.  4:  Augmented  Bayesian  Network 


Table I: CPT for $C_1$

Parent Nodes                  $C_1$
$A_2$    $A_3$    $C_1^0$     Yes     No
Yes      Yes      Yes         1.0     0.0
Yes      Yes      No          0.4     0.6
Yes      No       Yes         1.0     0.0
Yes      No       No          0.0     1.0
No       Yes      Yes         1.0     0.0
No       Yes      No          0.1     0.9
No       No       Yes         1.0     0.0
No       No       No          0.0     1.0

Table II: CPT for $D_1$

Parent Nodes                  $D_1$
$A_1$    $C_1$    $D_1^0$     Yes     No
Yes      Yes      Yes         1.0     0.0
Yes      Yes      No          0.4     0.6
Yes      No       Yes         1.0     0.0
Yes      No       No          0.05    0.95
No       Yes      Yes         1.0     0.0
No       Yes      No          0.35    0.65
No       No       Yes         1.0     0.0
No       No       No          0.0     1.0

Table III: CPT for $D_2$

Parent Nodes                  $D_2$
$B_1$    $C_1$    $D_2^0$     Yes     No
Yes      Yes      Yes         1.0     0.0
Yes      Yes      No          0.1     0.9
Yes      No       Yes         1.0     0.0
Yes      No       No          0.0     1.0
No       Yes      Yes         1.0     0.0
No       Yes      No          0.5     0.5
No       No       Yes         1.0     0.0
No       No       No          0.0     1.0

In the DBN of Fig. 2, the probability propagates vertically from causal nodes to effect nodes, and
horizontally from one time slice to the next, as follows:

(i) Set the initial pmfs of nodes: $P_0\{v_i\} = P_1\{v_i^0\}$, $v_i \in C \cup D$, based on the
known $X(t_0)$;

(ii) Let $k = 1$;

(iii) Select an action strategy
$S = \{(a_1, a_2, \ldots, a_{N_A}) \mid a_q \in \Omega_{A_q}, 1 \le q \le N_A\}$, where if
$a_q(t_k) = 1$, set $P_k\{A_q = 1\} = 1$; else set $P_k\{A_q = 1\} = 0$;

(iv) Randomly select probability mass functions of exogenous events $P_k\{B_j\}$,
$1 \le j \le N_B$;

(v) Calculate probability mass functions of the intermediate and desired effects using Bayesian
model averaging [Madigan96]:

$$P_k\{v_i\} = \sum_{\pi(v_i), v_i^0} P\{v_i \mid \pi(v_i), v_i^0\} \cdot P_k\{\pi(v_i)\}
  P_k\{v_i^0\}, \quad v_i \in C \cup D;$$

(vi) Propagate the current probability mass functions to the next time slice:

$$P_{k+1}\{v_i^0\} = P_k\{v_i\}, \quad v_i \in C \cup D;$$

(vii) Let $k = k + 1$. If $k \le T$, go back to step (iii); otherwise, stop.
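
Steps (i)-(vii) can be sketched in Python for the small network whose CPTs appear in Tables I-III. This is a minimal sketch under stated assumptions, not the authors' implementation: the parent marginals are multiplied as if the parents were independent, the range [0.2, 0.6] for $B_1$ is borrowed from the illustration in Section 2, the joint success probability is taken as the product of the two marginals (as in Eq. (1)), and the names `marginal` and `propagate` are hypothetical.

```python
import random

# P(node = 1 | parent 1, parent 2, previous state of the node), read from Tables I-III.
CPT_C1 = {(1, 1, 1): 1.0, (1, 1, 0): 0.4, (1, 0, 1): 1.0, (1, 0, 0): 0.0,
          (0, 1, 1): 1.0, (0, 1, 0): 0.1, (0, 0, 1): 1.0, (0, 0, 0): 0.0}   # A2, A3, C1^0
CPT_D1 = {(1, 1, 1): 1.0, (1, 1, 0): 0.4, (1, 0, 1): 1.0, (1, 0, 0): 0.05,
          (0, 1, 1): 1.0, (0, 1, 0): 0.35, (0, 0, 1): 1.0, (0, 0, 0): 0.0}  # A1, C1, D1^0
CPT_D2 = {(1, 1, 1): 1.0, (1, 1, 0): 0.1, (1, 0, 1): 1.0, (1, 0, 0): 0.0,
          (0, 1, 1): 1.0, (0, 1, 0): 0.5, (0, 0, 1): 1.0, (0, 0, 0): 0.0}   # B1, C1, D2^0


def marginal(cpt, p1, p2, p_prev):
    """Step (v): average the CPT over parent states (parents assumed independent)."""
    total = 0.0
    for (x, y, z), p_true in cpt.items():
        weight = ((p1 if x else 1 - p1) * (p2 if y else 1 - p2)
                  * (p_prev if z else 1 - p_prev))
        total += p_true * weight
    return total


def propagate(a1, a2, a3, T, rng, b_range=(0.2, 0.6)):
    """One Monte Carlo run: returns P{D1 = 1} * P{D2 = 1} after T time slices."""
    c1 = d1 = d2 = 0.0                               # step (i): all effects start false
    for k in range(T):                               # steps (ii) and (vii)
        b1 = rng.uniform(*b_range)                   # step (iv): random pmf for B1
        c1_now = marginal(CPT_C1, a2[k], a3[k], c1)  # step (v), intermediate effect first
        d1 = marginal(CPT_D1, a1[k], c1_now, d1)
        d2 = marginal(CPT_D2, b1, c1_now, d2)
        c1 = c1_now                                  # step (vi): becomes next dummy pmf
    return d1 * d2
```

Here each action is passed as a list of 0/1 activations per time slice, so a call such as `propagate([1] * 7, [1] * 7, [1] * 7, 7, random.Random(0))` evaluates one candidate strategy over a seven-slice horizon.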

3.3  Action  Strategy  Optimization  with  GA 

3.3.1  Algorithm  Overview 

Genetic  algorithms  (GAs)  are  general-purpose  global  optimization  techniques  based  on  the  principles  of 
evolution  observed  in  nature.  They  combine  selection,  crossover  and  mutation  operators  with  the  goal 
of  finding  the  best  solution  to  a  problem.  GA  creates  an  initial  population,  evaluates  the  fitness  of  each 
individual  in  this  population,  and  searches  for  a  near-optimal  solution  from  generation  to  generation 
until a specified termination criterion is met. These algorithms have been widely used in areas where
exhaustive search may be infeasible because of a large search space and where domain knowledge is
difficult or impossible to obtain.


The  use  of  genetic  algorithms  requires  the  specification  of  six  fundamental  elements:  chromosome 
representation,  selection  function,  the  genetic  operators  making  up  the  reproduction  function,  the 
creation  of  initial  population,  termination  criteria  and  the  evaluation  function  [Houck95] . 

In  this  section,  we  use  GA  to  navigate  the  solution  space  to  obtain  a  near-optimal  action  strategy.  A 
typical  GA  may  have  a  genetic  cycle  as  follows  [Stender94]: 

(i)  Initialize  the  population  randomly  or  with  potentially  good  solutions; 

(ii)  Evaluate  the  fitness  for  each  individual  in  the  population; 

(iii)  Select  parents  for  alteration; 

(iv)  Create  offspring  by  crossover  and  mutation  operators; 

(v)  Reorganize  the  population  by  deleting  old  ones  and  creating  new  ones  while  keeping  the  total 
size  fixed; 

(vi)  Go  to  (iii)  until  termination  criteria  are  met. 

Our  implementation  of  GA  for  strategy  optimization  is  illustrated  in  Fig. 5.  Important  steps  and 
fundamental  issues  will  be  explained  in  more  detail  in  the  following  subsections. 


[Figure; recoverable label: “Optimized Strategy”]

Fig. 5: GA Cycle for Strategy Optimization

3.3.2  Chromosome  Representation 

For any GA, a chromosome representation is necessary to describe each individual in the solution
population. The population in our problem corresponds to candidate strategies to induce the desired
effects. We use an integer-valued GA in our problem. In section 2, the feasible actions are given by
$A = \{A_q \mid 1 \le q \le N_A\}$ with $A_q \in \{a_q^0, a_q^1, \ldots, a_q^{r_q}\}$. Thus, the
chromosome can be represented as a string of integer genes
$\omega = (\omega_1 \omega_2 \cdots \omega_{N_A})$, where $0 \le \omega_q \le r_q$. The lower bound
“0” corresponds to the null action “do not perform $A_q$” over the entire timeline. If
$\omega_q = 1$, $a_q^1$ is picked for $A_q$; if $\omega_q = 2$, $a_q^2$ is picked for $A_q$; and so
on. In other words, the gene is coded to represent the assignment of an action, and the whole
chromosome is a code representing an action strategy.
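
As a minimal sketch of this encoding, the Python fragment below decodes an integer chromosome into one binary activation string per action. The horizon `T`, the two domains in `DOMAINS` and the function name `decode` are hypothetical and chosen only for illustration.

```python
T = 3  # number of time slices (illustrative value)

# Hypothetical domains: DOMAINS[q][i] is the binary activation string a_q^i over T slices.
DOMAINS = [
    [(0, 0, 0), (1, 1, 1)],              # first action, r = 1
    [(0, 0, 0), (1, 0, 0), (0, 1, 1)],   # second action, r = 2
]


def decode(chromosome):
    """Map integer genes (one per action) to the activation strings they select."""
    if len(chromosome) != len(DOMAINS):
        raise ValueError("one gene per action is required")
    return [DOMAINS[q][gene] for q, gene in enumerate(chromosome)]


strategy = decode((1, 2))   # gene 1 for the first action, gene 2 for the second
```

Gene value 0 always selects the all-zero string, which matches the null action described above.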

3.3.3 Initial Population and Pre-filtering

Population initialization is the first step in GA. The most popular method is to randomly
initialize the population. However, since GAs can iteratively improve existing solutions, the
beginning population can be seeded with potentially good solutions [Houck95], especially for cases
where partial knowledge about the solution is known. In our problem, we generate the initial
strategies randomly. Thus, for any individual $\omega = (\omega_1 \omega_2 \cdots \omega_{N_A})$ in
the initial population, $\omega_q$ ($1 \le q \le N_A$) is randomly selected from
$\{0, 1, \ldots, r_q\}$. The size of the population can be selected to conform to available
computational resources (time and memory) and to accommodate the size of the solution space.


In planning, other important issues such as the cost of a strategy and the available resources need
to be considered. A randomly created individual $\omega = (\omega_1 \omega_2 \cdots \omega_{N_A})$
is pre-filtered to satisfy the constraints of cost and resource budgets. For example, verifying for
each individual whether $\sum_{q=1}^{N_A} f_{a_q} \le F_{budget}$ is satisfied enables us to check
the cost constraint for feasibility.
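
Random initialization with cost pre-filtering could look like the following Python sketch. The rejection-sampling loop, the cost table `COST`, the budget of 10 and the function name `random_feasible_population` are assumptions made for illustration; the paper does not specify how infeasible individuals are replaced.

```python
import random


def random_feasible_population(r, cost, budget, size, rng, max_tries=10000):
    """Sample genes uniformly from {0, ..., r_q}; keep only individuals within budget."""
    population = []
    for _ in range(max_tries):
        if len(population) == size:
            break
        individual = tuple(rng.randint(0, r_q) for r_q in r)
        total_cost = sum(cost[q][gene] for q, gene in enumerate(individual))
        if total_cost <= budget:          # cost constraint of Eq. (2)
            population.append(individual)
    return population


# Hypothetical example: three actions with upper gene bounds (1, 1, 7) and made-up costs.
R = (1, 1, 7)
COST = [[0, 5], [0, 3], [0, 1, 2, 3, 4, 5, 6, 7]]
population = random_feasible_population(R, COST, budget=10, size=20, rng=random.Random(0))
```

The `max_tries` cap is a guard for budgets so tight that few individuals are feasible; in that case the function returns a smaller population instead of looping forever.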


3.3.4  Evaluation  Function 

DBN  performs  the  inner  loop  inference  to  compute  the  evaluation  function  for  GA.  The  evaluation 
function  will  map  the  population  candidate  into  a  partially  ordered  set  [Houck95],  which  will  be  input  to 
the  next  step,  i.e.,  population  selection. 

DBN is used to obtain the probability of achieving the desired effects at certain time slices for a
given strategy, $P\{D_1(t_{D_1}) D_2(t_{D_2}) \cdots D_{N_D}(t_{D_{N_D}}) \mid X(t_0), S\}$. In a
noisy environment, this probability is a random variable because of the uncertainty in the
statistical description of exogenous events $B$. In the DBN loop, we generate a histogram of this
probability via $M$ Monte Carlo runs; the sample mean and variance are computed via:

$$\mu = \frac{1}{M} \sum_{i=1}^{M} P_i\{D_1(t_{D_1}) D_2(t_{D_2}) \cdots D_{N_D}(t_{D_{N_D}})
  \mid X(t_0), S\} \tag{5a}$$

$$\sigma^2 = \frac{1}{M-1} \sum_{i=1}^{M} \left( P_i\{D_1(t_{D_1}) D_2(t_{D_2}) \cdots
  D_{N_D}(t_{D_{N_D}}) \mid X(t_0), S\} - \mu \right)^2 \tag{5b}$$



Signal-to-noise ratio (SNR) provides a measure of goodness or fitness of a strategy. SNR is
computed via [Phadke89]:

$$SNR = -10 \log_{10} \left[ \frac{1}{\mu^2} \left( 1 + 3 \frac{\sigma^2}{\mu^2} \right) \right]
  \tag{6}$$

This SNR corresponds to a larger-the-better type robust design problem. The term
$\frac{1}{\mu^2} \left( 1 + 3 \frac{\sigma^2}{\mu^2} \right)$ is an approximation of the mean
square reciprocal quality characteristic, which implies maximization of $\mu$ while minimizing
$\sigma^2$. The optimized evaluation function, SNR, corresponds to a strategy that has a high
probability of success and that is also robust to changes in the environment (unforeseen events,
uncertainty in parameters, etc.).
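
A minimal Python sketch of the fitness computation in Eqs. (5a), (5b) and (6) follows. The function name `snr_larger_the_better` and the guard for a non-positive mean are assumptions added for illustration; the input is a list of success probabilities from the Monte Carlo runs.

```python
import math
from statistics import mean, variance


def snr_larger_the_better(samples):
    """SNR of Eq. (6) from Monte Carlo success probabilities (needs at least two samples)."""
    mu = mean(samples)               # sample mean, Eq. (5a)
    var = variance(samples)          # sample variance with M - 1 denominator, Eq. (5b)
    if mu <= 0.0:
        return float("-inf")         # assumed convention: a strategy that never succeeds
    return -10.0 * math.log10((1.0 / mu**2) * (1.0 + 3.0 * var / mu**2))


steady = snr_larger_the_better([0.5, 0.5, 0.5])
noisy = snr_larger_the_better([0.4, 0.5, 0.6])
better = snr_larger_the_better([0.8, 0.8, 0.8])
```

The three example inputs show the intended ordering: for the same mean, the sample set with more spread gets a lower SNR, and a higher mean with the same spread gets a higher SNR.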


3.3.5  Selection  Function 

The  fitness  evaluation  provides  a  partially  ordered  set  of  candidate  strategies  from  the  best  to  the  worst. 
If  the  termination  criteria  are  not  met,  successive  generations  are  produced  from  individuals  selected 
from  a  partially  ordered  set.  There  are  several  schemes  for  the  selection  process:  roulette  wheel 
selection  and  its  extensions,  scaling  techniques,  tournament,  elitist  models,  and  ranking  methods 
[Houck95]. 

Holland’s roulette wheel [Holland75] is the first and perhaps the most popular selection scheme
imitating natural selection. However, the traditional roulette wheel restricts the evaluation
function in that it must map the solutions to a fully ordered set of values on $\mathbb{R}^+$.
Since SNR is negative in our case, we use the normalized geometric ranking method [Joines94] as
follows.

When the population is $\{S_i \mid 1 \le i \le N_P\}$, the probability of selecting $S_i$ is
defined as:

$$P(\text{select } S_i) = \frac{q (1 - q)^{r - 1}}{1 - (1 - q)^{N_P}} \tag{7}$$



where  q  is  a  specified  probability  of  selecting  the  best  individual,  r  is  the  rank  of  the  individual  with  the 
best  individual  ranked  at  ‘1’.  The  best  individual  will  have  a  better  chance  of  being  selected  for 
reproducing  an  offspring  for  the  next  generation. 
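
The ranking-based selection can be sketched in Python as follows. The sketch assumes the usual normalized geometric ranking form, in which rank $r$ is selected with probability $q(1-q)^{r-1}$ divided by $1 - (1-q)^{N_P}$; the function names and the toy fitness values are hypothetical.

```python
import random


def ranking_probabilities(n, q):
    """Selection probability for ranks 1..n, where rank 1 is the best individual."""
    norm = 1.0 - (1.0 - q) ** n
    return [q * (1.0 - q) ** (r - 1) / norm for r in range(1, n + 1)]


def select(population, fitness, q, k, rng):
    """Draw k parents; only the fitness ordering matters, so negative SNR values are fine."""
    ranked = sorted(population, key=fitness, reverse=True)   # best first
    probs = ranking_probabilities(len(ranked), q)
    return rng.choices(ranked, weights=probs, k=k)


# Hypothetical strategies with made-up (negative) SNR fitness values.
toy_fitness = {"s1": -6.0, "s2": -9.5, "s3": -7.2, "s4": -12.0}
parents = select(list(toy_fitness), toy_fitness.get, q=0.3, k=6, rng=random.Random(1))
```

Because the probabilities depend on rank alone, no shifting or scaling of the SNR values is needed before selection.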

3.3.6  Genetic  Operators 

Mutation  and  crossover  are  basic  operators  to  create  new  population  based  on  individuals  in  the  current 
generation.  Crossover  takes  two  individuals  and  produces  two  new  individuals,  while  mutation  alters 
one  individual  to  produce  a  single  new  solution  [Houck95].  Since  our  chromosome  is  a  string  of 
integers,  we  employ  the  following  genetic  operators  to  generate  individuals  for  the  new  strategy: 

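Uniform mutation:

$$\omega'_q = \begin{cases} U(0, r_q) & \text{if the } q\text{th gene of the chromosome is selected
  for mutation} \\ \omega_q & \text{otherwise} \end{cases} \tag{8}$$

Integer-valued simple crossover generates a random number $l$ from $U(1, N_A)$, and creates two new
strategies $S'_i$ and $S'_j$ through interchange of genes as follows, where $\omega^{(i)}_q$ denotes
the $q$th gene of strategy $S_i$:

$$\omega'^{(i)}_q = \begin{cases} \omega^{(i)}_q & \text{if } q < l \\ \omega^{(j)}_q &
  \text{else} \end{cases} \qquad
  \omega'^{(j)}_q = \begin{cases} \omega^{(j)}_q & \text{if } q < l \\ \omega^{(i)}_q &
  \text{else} \end{cases} \tag{9}$$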
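
The two operators can be sketched in Python as below. This is an illustrative reading of uniform mutation and single-point crossover for integer chromosomes; the function names, the choice of mutating exactly one gene per call, and the 1-indexed cut point drawn from 1 to the chromosome length are assumptions.

```python
import random


def uniform_mutation(chromosome, r, rng):
    """Pick one gene at random and redraw it uniformly from {0, ..., r_q}."""
    q = rng.randrange(len(chromosome))
    child = list(chromosome)
    child[q] = rng.randint(0, r[q])
    return tuple(child)


def simple_crossover(parent_i, parent_j, rng):
    """Genes before the cut point stay with their parent; the remaining genes are exchanged."""
    l = rng.randint(1, len(parent_i))    # 1-indexed cut point
    cut = l - 1                          # number of leading genes each child keeps
    child_i = parent_i[:cut] + parent_j[cut:]
    child_j = parent_j[:cut] + parent_i[cut:]
    return child_i, child_j


rng = random.Random(3)
upper = (1, 1, 7)                        # hypothetical gene upper bounds
p_i, p_j = (0, 1, 7), (1, 0, 2)
c_i, c_j = simple_crossover(p_i, p_j, rng)
mutant = uniform_mutation(p_i, upper, rng)
```

Both operators keep every gene inside its own bound, so offspring only need to be re-checked against the cost constraint, not against the domain sizes.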


3.3.7  Termination  Criteria 

GA  runs  from  one  generation  to  the  next,  evaluating,  selecting  and  reproducing  until  a  predefined 
termination  criterion  is  met.  Typically,  there  are  three  kinds  of  stopping  criteria: 

(i)  Define  a  maximum  number  of  generations  and  stop  at  a  predefined  generation. 

(ii) Stop when the population converges, that is, when all the individuals in the population have
approximately the same fitness value.

(iii)  Stop  when  there  is  no  distinct  improvement  in  the  fittest  solution  over  a  specified  number  of 
generations. 

The  fittest  strategy  at  the  terminal  generation  corresponds  to  the  optimized  strategy. 
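
The three stopping criteria can be combined in a single check, sketched here in Python. The function name `should_stop`, the tolerance `tol` and the way the history of best fitness values is stored are hypothetical choices for illustration.

```python
def should_stop(best_history, current_fitnesses, max_generations, stall, tol=1e-6):
    """Return True if any of the three termination criteria is met.

    best_history holds the best fitness of each completed generation, oldest first.
    """
    generation = len(best_history)
    if generation >= max_generations:                            # criterion (i)
        return True
    if max(current_fitnesses) - min(current_fitnesses) <= tol:   # criterion (ii)
        return True
    if generation > stall and best_history[-1] - best_history[-1 - stall] <= tol:
        return True                                              # criterion (iii)
    return False
```

For example, `should_stop([-8.0, -7.0, -6.5, -6.5, -6.5], [-6.5, -9.0], 100, 2)` returns `True` because the best fitness has not improved over the last two generations.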

4.  Illustrative  Examples  and  Results 

4.1  Business  Scenario 

4.1.1  Example  Description 

Fig. 1 is a simplified partial model of a marketing problem faced by a hypothetical company. Suppose
the company wants to advertise and promote sales via traditional media marketing, as well as Internet
marketing (online promotions). The relevant actions are:

$A_1$: Use the Sunday newspaper to advertise products and deliver discount coupons to potential
customers;

$A_2$: Promote the URL (Uniform Resource Locator) of the company in the Sunday newspaper. This
action is also known as integrated marketing that promotes online clients through traditional
media;

$A_3$: Advertise on the company’s website and make coupons available for download and printing.

The marketing and sales divisions of the company must choose the best strategy to achieve the direct goal of having the coupons redeemed, as well as having the majority of customers return to the company's URL. We define the desired effect $D_1$ as "Promotional coupons are redeemed" and $D_2$ as "Customers revisit URL". However, aside from the actions taken by the company, other events such as specific customers' preferences may also significantly affect the desired effects. Define "Customers dislike the promoted products" as exogenous event $B_1$. The redeemed coupons may be either the coupons published in newspapers or coupons downloaded and printed from the website. An intermediate effect $E_1$ is used to depict "Coupons are downloaded from website". In summary, we have three potential actions $A_1$, $A_2$ and $A_3$; two desired effects $D_1$ and $D_2$; an intermediate effect $E_1$ that transfers the influences from $A_2$ and $A_3$ to $D_1$; and one exogenous event $B_1$, which has a certain influence on $D_2$.

4.1.2  Experiment  Results 

Suppose the initial effects (desired or intermediate) are all zeros and we desire $D_1(7) = 1$ and $D_2(7) = 1$. Define $\Omega_{A_1} = \{a_1^0, a_1^1\}$, $\Omega_{A_2} = \{a_2^0, a_2^1\}$ and $\Omega_{A_3} = \{a_3^0, a_3^1, a_3^2, a_3^3, a_3^4, a_3^5, a_3^6, a_3^7\}$, where $a_i^0$ ($i = 1, 2, 3$) implies no action, $a_1^1$ corresponds to the action to advertise coupons in the Sunday newspaper, and $a_2^1$ is the action to advertise the URL in the Sunday newspaper. Note that these two advertising actions are valid for the entire week. For the third action, $a_3^7$ corresponds to online coupons being available at the same time as the URL promotion; $a_3^6$ has a one-day delay, $a_3^5$ has a two-day delay, and so on. All the actions are listed in Table IV. Event $B_1$ (customers dislike the promoted products) is assumed to occur with a probability that is uniformly distributed over [0.2, 0.6].


Table IV: Potential Actions for the Marketing Problem

Action | A1           | A2           | A3
       | a_1^0  a_1^1 | a_2^0  a_2^1 | a_3^0  a_3^1  a_3^2  a_3^3  a_3^4  a_3^5  a_3^6  a_3^7
t1     |   0      1   |   0      1   |   0      0      0      0      0      0      0      1
t2     |   0      1   |   0      1   |   0      0      0      0      0      0      1      1
t3     |   0      1   |   0      1   |   0      0      0      0      0      1      1      1
t4     |   0      1   |   0      1   |   0      0      0      0      1      1      1      1
t5     |   0      1   |   0      1   |   0      0      0      1      1      1      1      1
t6     |   0      1   |   0      1   |   0      0      1      1      1      1      1      1
t7     |   0      1   |   0      1   |   0      1      1      1      1      1      1      1

Consider three strategies: $S_1 = (a_1^1, a_2^1, a_3^4)$, $S_2 = (a_1^0, a_2^1, a_3^7)$ and $S_3 = (a_1^1, a_2^1, a_3^7)$. Figs. 6(a-c) show the results of single runs of the DBN with a fixed prior probability for event $B_1$: $P(B_1 = 1) = 0.4$. Evidently, the probabilities of the desired effects $D_1$ and $D_2$, as well as of the intermediate effect $E_1$, are functions of time. Since the occurrence of the exogenous event $B_1$ is random, we generated 100 Monte Carlo runs for these three strategies and computed the joint probability of the desired effects $P\{D_1(7) = 1, D_2(7) = 1 \mid S\}$. Fig. 6(d) shows that the variance of the joint probability may also change with time and that strategy $S_3$ is clearly the best of the three.

Indeed, $S_3$ is the optimal strategy. The results in Fig. 6(e) were obtained from a GA run of 20 generations, with each generation having a population of size 10. The sample mean and variance of the joint probability of the desired effects for each individual are obtained from 1000 Monte Carlo runs. SNR, as defined earlier, serves as the fitness measure. Starting from the randomly generated initial population in Fig. 6(f), Fig. 6(e) shows that the best strategy is $S_{opt} = (a_1^1, a_2^1, a_3^7) = S_3$ and that the GA converges in fewer than 10 generations.
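The Monte Carlo fitness evaluation used inside the GA loop can be sketched as follows. Two assumptions are made here and should be checked against the SNR definition given earlier in the paper: the SNR is taken to be the Taguchi larger-the-better form, $-10 \log_{10}[(1/\mu^2)(1 + 3\sigma^2/\mu^2)]$, and `toy_evaluate` is a hypothetical stand-in for DBN inference, not the marketing model itself.

```python
import math
import random
import statistics


def snr_larger_the_better(samples):
    """Assumed Taguchi larger-the-better SNR from sample mean and variance."""
    mu = statistics.fmean(samples)
    var = statistics.variance(samples)
    return -10.0 * math.log10((1.0 / mu ** 2) * (1.0 + 3.0 * var / mu ** 2))


def monte_carlo_fitness(evaluate, strategy, n_runs=1000, seed=0):
    """Draw the prior of B1 uniformly from [0.2, 0.6] in every run and
    return the SNR of the resulting probabilities of success."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_runs):
        p_b1 = rng.uniform(0.2, 0.6)           # uncertain prior of the exogenous event
        samples.append(evaluate(strategy, p_b1))
    return snr_larger_the_better(samples)


def toy_evaluate(strategy, p_b1):
    """Hypothetical placeholder for the DBN; returns a probability of success."""
    base = 0.9 if strategy == "good" else 0.6
    return base * (1.0 - 0.3 * p_b1)
```

A strategy with a higher mean and a smaller spread of the probability of success receives a larger SNR, which is the sense in which the fitness rewards both goodness and robustness.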

The following conclusions can be drawn from the results in Fig. 6:

(i)  Since $S_3$ is substantially better than $S_1$, the time to put coupons on the website ($A_3$) cannot lag too much after the advertisement in the newspaper ($A_2$).

(ii)  Since $S_2$ and $S_3$ have similar performance, the benefit of traditional marketing is very limited. Consequently, action $A_1$ may be removed from the action set.


Fig. 6  Simulation Results: (a) Single Run of $S_1$; (b) Single Run of $S_2$; (c) Single Run of $S_3$; (d) 100 Monte Carlo Runs of $S_1$, $S_2$ and $S_3$; (e) Strategy Optimization through GA; (f) Initial Population. Axis labels: intermediate and desired effects; fitness (SNR).


4.1.3  Statistical  Analysis 

Fig. 7 shows a histogram, obtained from 1000 Monte Carlo runs, of the probability of achieving the desired effects given strategy $S_3$. The histogram is nearly Gaussian, which is consistent with the Central Limit Theorem (CLT). In Fig. 8, we plot $P_{00}$, $P_{01}$, $P_{10}$ and $P_{11}$, which represent the sample means of $P(D_1(7) = 0, D_2(7) = 0 \mid S_{opt})$, $P(D_1(7) = 0, D_2(7) = 1 \mid S_{opt})$, $P(D_1(7) = 1, D_2(7) = 0 \mid S_{opt})$ and $P(D_1(7) = 1, D_2(7) = 1 \mid S_{opt})$, respectively. We can see that $P_{11}$ is significantly higher than the others. Based on the Gaussian approximation, the following statistical analysis can be performed on the obtained results:


Fig. 7  Histogram of 1000 Monte Carlo Runs

Fig. 8  Pdf of Sample Mean

(i)  Two-sided  Confidence  Region 

If the sample size is sufficiently large, the two-sided confidence region for the probability of reaching the desired effects can be calculated from the sample mean $\mu$ and sample standard deviation $\sigma$ as:

\[
(\mu - \sigma z_{\alpha/2},\ \mu + \sigma z_{\alpha/2}) \tag{10}
\]

Here $z_{\alpha/2}$ is the critical value for which a $N(0,1)$ random variable lies in $(-z_{\alpha/2}, z_{\alpha/2})$ with probability $1 - \alpha$, so that (10) is a $100(1-\alpha)\%$ region. With the sample mean $\mu = 0.837$ and standard deviation $\sigma = 0.0141$, the 95% confidence region is [0.8095, 0.8646]. Thus, given the prior pmfs for the exogenous events and the conditional probability tables of the Bayesian network, we can be quite confident that the probability of achieving the desired effects is in the range [0.8095, 0.8646], as illustrated in Fig. 9. A narrower confidence region means better control of the environment.
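The region in (10) is straightforward to reproduce numerically. The sketch below uses only the Python standard library; the function name `two_sided_region` is an assumption introduced for illustration.

```python
from statistics import NormalDist


def two_sided_region(mean, std, confidence=0.95):
    """Gaussian two-sided confidence region (mean - std*z, mean + std*z)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, about 1.96 for 95%
    return mean - std * z, mean + std * z


# Values reported for the marketing example
low, high = two_sided_region(0.837, 0.0141)
print(f"[{low:.4f}, {high:.4f}]")
```

With the rounded values 0.837 and 0.0141, the result agrees with the reported region to within the rounding of the sample mean and standard deviation.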

Fig. 10 shows the propagation of the mean probability of achieving the desired effect $D_2$ and the 95% confidence intervals under strategies $S_2 = (a_1^0, a_2^1, a_3^7)$ and $S_4 = (a_1^1, a_2^1, a_3^6)$. We can conclude from Fig. 10 that different strategies may have very different trajectories and that the confidence regions may also change with time.

In  some  cases,  the  confidence  regions  may  overlap  with  each  other  for  two  strategies.  In  this  case, 
we  cannot  simply  declare  one  strategy  to  be  superior  to  another  one.  Cost  of  the  strategy  can  be 
included  as  a  secondary  criterion,  that  is,  a  strategy  with  less  cost  will  be  preferable  to  one  with  a  higher 
cost,  even  though  both  may  be  within  the  cost  budget. 

(ii)  One-sided  Confidence  Region 


The two-sided confidence region depicts the precision of the predicted probability. Since our purpose is to maximize the probability of achieving the desired effects, another parameter of interest is a lower bound on $\mu$. This results in a one-sided probability region $(\mu_L, 1)$, where $\mu_L = \mu - \sigma z_{\alpha}$. For the above Monte Carlo runs, $\mu_L = 0.8139$ for a 95% confidence level. The lower bound tells us that the probability of achieving the desired effects will be no less than 0.8139 with 95% confidence.


Fig. 9  95% Confidence Region

Fig. 10  Trajectory of Desired Effect $D_2(t_k) = 1$


(iii)  Hypothesis  Testing 

Suppose the probability of achieving the desired effects is required to be at least $p_0$. Then, the question is: can we accept the results from the Monte Carlo runs? The following binary hypothesis-testing formulation answers this question:

\[
H_0: \mu \le p_0 \qquad H_1: \mu > p_0 \tag{11}
\]

For a specified tail probability $\alpha$, if the statistic $(\mu - p_0)/\sigma$, computed from the sample mean and standard deviation, exceeds a threshold $z_{\alpha}$, we reject $H_0$ and accept that the true value is higher than $p_0$. Thus, the strategy is acceptable. Otherwise, we accept $H_0$, that is, the best strategy does not meet our expectation. If the model is credible, the latter result implies that the desired effects are beyond the capability of the available actions.
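The one-sided bound and the hypothesis test are two views of the same comparison: the test statistic exceeds $z_{\alpha}$ exactly when $p_0$ lies below the lower bound $\mu - \sigma z_{\alpha}$. The following sketch illustrates this with the reported sample statistics; the function names and the two required levels 0.80 and 0.82 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def lower_bound(mean, std, confidence=0.95):
    """One-sided Gaussian lower bound mean - std * z_alpha."""
    z_alpha = NormalDist().inv_cdf(confidence)   # about 1.645 for 95%
    return mean - std * z_alpha


def strategy_is_acceptable(mean, std, p0, alpha=0.05):
    """Reject H0 (and accept the strategy) if (mean - p0)/std exceeds z_alpha."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return (mean - p0) / std > z_alpha


bound = lower_bound(0.837, 0.0141)
accept_080 = strategy_is_acceptable(0.837, 0.0141, p0=0.80)  # hypothetical requirement
accept_082 = strategy_is_acceptable(0.837, 0.0141, p0=0.82)  # hypothetical requirement
```

Under these assumed requirements, a required level of 0.80 falls below the lower bound and the strategy is accepted, whereas 0.82 lies above it and the test does not reject $H_0$.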


4.2  Military  Scenario 
4.2.1  Example  Description 

Friendly  forces  are  assigned  to  capture  a  seaport.  There  is  a  suitable  landing  beach  with  a  road  leading 
to the seaport. An approximate concentration of the hostile forces is known from intelligence sources.
In  addition,  friendly  intelligence  reports  that  the  enemy  is  using  tanks  to  prevent  the  infantry 
advancement  along  the  roads.  The  mission  objective  is  to  capture  the  seaport,  while  minimizing  the 
friendly  losses  due  to  attrition.  Drawing  upon  intelligence-generated  knowledge,  the  commander 
identifies  the  following  tactical  and  operational  centers  of  gravity  (COG)  that  may  need  to  be  attacked 
or  defended,  as  well  as  other  objects  of  interest  whose  state  will  affect  the  dynamics  of  the  battlespace 
and  the  mission  outcome:  hostile  (enemy)  air;  hostile  patrol-boats;  hostile  tanks;  neutral  air;  neutral 
patrol-boats;  neutral  tanks;  landing  beach  and  the  seaport.  This  fictitious  scenario  is  shown  in  Fig.  11. 
Suppose that the initial environment state at $t_0$ corresponds to no friendly losses and non-capture of the seaport, and that the timeline to execute the mission is divided into five time slices ($t_1$ through $t_5$). We will measure the mission performance (in terms of the joint probability of achieving the desired effects) at time $t_5$.

The hostile forces are modeled as exogenous events, where: $B_1$: hostile patrol-boats; $B_2$: hostile air; $B_3$: hostile tanks. Each event has an approximate probability, based on intelligence information on the strength of the hostile forces. However, the time at which the enemy decides to use its forces is unpredictable.

In the same vein, the feasible actions of friendly forces are: $A_1$: neutralize hostile patrol-boats; $A_2$: neutralize hostile air; $A_3$: neutralize hostile tanks; $A_4$: advance to seaport. The feasible actions, along with potential times of their application, are listed in Table V. It is also specified that action $A_4$ cannot be taken earlier than $t_4$ due to certain constraints. Since hostile tanks can only be encountered when friendly forces advance to the seaport, the possible times to take action $A_3$ are $t_3$ or $t_4$.

Desired effects are defined as: $D_1$: capture the seaport; $D_2$: keep friendly losses to a minimum.

The following intermediate effects are designed to connect actions or events to the desired effects: $C_1$: threat from hostile patrol-boats; $C_2$: threat from hostile air; $C_3$: threat from hostile tanks; $C_4$: friendly losses in landing on the beach.

The  nodes  are  interconnected  as  a  Bayesian  network.  Fig.  11  is  the  corresponding  augmented 
network  by  introducing  dummy  nodes  for  intermediate  and  desired  effects.  The  CPTs  are  listed  in 
Tables  VI  through  XI. 


Table  V:  Potential  Actions 
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Ai 

A 

2 

Ay 

A4 

a\ 

a\ 

2 

ax 

3 

ax 

4 

ax 

«2 

Cl2 

2 

a2 

3 

a2 

4 

a2 

Cl2 

a\ 

a\ 

a4 

a\ 

2 

a4 

ti 

0 

l 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 
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0 

0 

1 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

t3 

0 

0 

0 

1 

0 

0 

0 

0 

0 

1 
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0 

0 

0 

0 

0 

0 
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0 

0 

0 
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0 
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0 
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0 

1 
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1 

1 

Table  VI:  CPT  for  Q 


Parent  Nodes 

c. 

c,° 

B, 

Ai 

Yes 

No 

Yes 

Yes 

Yes 

0.7 

0.3 

No 

1.0 

0.0 

No 

Yes 

0.7 

0.3 

No 

1.0 

0.0 

No 

Yes 

Ta 

Yes 

Die  IX 

0.1 

CPT  I 

0.9 

Dr  C4 

Parent  Nodes 

c4 

c4° 

C2 

C, 

Yesl 

Nol 

Yes 

Yes 

1.0 

0.0 

Yes 

No 

0.8 

0.2 

No 

Yes 

0.7 

0.3 

No 

0.2 

0.8 

Yes 

Yes 

0.9 

0.1 

No 

No 

0.6 

0.4 

No 

Yes 

0.5 

0.5 

No 

0.0 

1.0 

No 

0.7 

0.3 

No 

Yes 

0.0 

1.0 

No 

0.0 

1.0 

Table  VII:  CPT  for  C2 


Parent  Nodes 

C2 

c2° 

B2 

A: 
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No 

Yes 
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0.6 

0.4 

No 

1.0 

0.0 

No 
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0.6 

0.4 

No 
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0.0 

No 
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0.1 

0.9 

No 

0.9 

0.1 

<D 

H 
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CP 
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1.0 

1 
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D, 
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C3 

A, 
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No 
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0.7 
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No 
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No 
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No 
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No 
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1.0 

No 
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0.9 

0.1 

No 

0.0 

1.0 

No  0.0  1.0 


Table  VIII:  CPT  for  C3 


Parent  Nodes 

C3 

c3° 

b3 

a3 
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No 
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0.3 

0.7 

No 
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No 
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No 
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No 
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No 
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No 
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No 
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Table  XI:  CPT  for  P2 


Parent  Nodes 

d2 

d2° 

C4 

c3 
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No 
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No 
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No 
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No 
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No 
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No 
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No 
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0.2 
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No 
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1.0 

4.2.2  Simulation  Results 

Since  the  events  may  happen  at  arbitrary  times,  the  problem  is  changed  from  one  of  searching  for  an 
optimal  strategy  to  that  of  finding  a  set  of  decision  rules,  that  is,  given  a  possible  combination  of  events, 
which  strategy  will  maximize  the  probability  of  achieving  the  desired  effects.  Consider  two  cases:  (i) 
Friendly  forces  encounter  both  threats  from  hostile  air  and  hostile  patrol-boats  at  time  tL  Whenever 
friendly  forces  advance  to  the  seaport,  the  hostile  tanks  will  defend  immediately;  (ii)  Friendly  forces 
encounter  hostile  air  at  time  //  and  encounter  hostile  patrol-boats  at  time  tf,  the  hostile  tanks  will  act  as 
in case (i). The results from these two cases under strategy $S_1$ are illustrated in Fig. 12(a) and Fig. 12(b), respectively. In this scenario, we assumed that the intelligence sources are reliable, and the action probabilities of hostile forces are: $P\{B_1 = 1\} = 0.8$, $P\{B_2 = 1\} = 0.7$, $P\{B_3 = 1\} = 0.8$.


Fig. 12  Simulation Results: (a) $S_1$ for case (i); (b) $S_1$ for case (ii); (c) $S_2$ for case (ii); (d) $S_3$ for case (ii)

Since  hostile  air  and  hostile  patrol-boats  are  separately  encountered  in  case  (ii),  the  landing  beach 
will  be  under  a  moderate  threat.  On  the  other  hand,  in  case  (i),  the  combination  of  two  events  may  put 
the  friendly  forces  in  the  landing  beach  under  severe  threat  due  to  the  infeasibility  of  simultaneously 
dealing  with  both  threats.  Thus,  the  friendly  losses  will  be  higher  in  case  (i). 

Now, we focus on case (ii) to see which action strategy is better. Comparing $S_1$ with
S2  =  (a\,a\,a\,a\)  and  S3  =  {a\,a\,a23,a\) ,  we  can  see  from  Figs.  12  (b-d)  that  S2  is  the  best  among  these 
three  strategies  because  all  the  hostile  forces  are  immediately  neutralized.  As  a  consequence,  the 
friendly losses due to attrition are low. The solid lines in Figs. 12(c-d) depict the joint probability of achieving both of the desired effects: $P\{D_1(t_k) = 1, D_2(t_k) = 0\}$. Fig. 13 is the result from the genetic algorithm, where we use $P\{D_1(5) = 1, D_2(5) = 0\}$ as the fitness measure. Indeed, $S_2$ is the optimal solution from the GA.

Additionally,  we  consider  a  scenario  where  the  data  from  intelligence  sources  is  noisy.  We  model 
this  by  assuming  that  the  concentrations  of  the  hostile  forces  are  random.  We  suppose 
PjBjlq)  =  1}  =  PvP{B2(t2)  =  1}  =  P2,P{B3(t4)  =  1}  =  P3 ,  where  P,  is  uniformly  distributed  between  [0.6,  1], 
$P_2$ is uniformly distributed between [0.5, 0.9] and $P_3$ is uniformly distributed between [0.7, 0.9]. Results of $P\{D_1(5) = 1, D_2(5) = 0 \mid S_2\}$ from 1000 Monte Carlo runs are shown in the histogram of Fig. 14, with the
Gaussian  distribution  superimposed.  The  sample  mean  and  standard  deviation  are  0.8641  and  0.0089, 
respectively.  The  two-sided  95%  confidence  region  of  this  strategy  is  (0.8467,  0.8816). 


Fig.  13  Strategy  Optimization  through  GA 


Fig.  14  1000  Monte  Carlo  Runs  for  S2 


5.  Conclusions  and  Future  Work 

This paper introduced a general methodology, based on an integration of dynamic Bayesian networks and genetic algorithms, to optimize strategies for offline decision support. The DBN is used for evaluating the probability of achieving the desired effects for a given strategy, while the GA is applied to search for the optimum solution in a relatively large solution space. Since uncertainty is unavoidable in military as well as business applications, the desired effects are in fact random processes. As a consequence, Monte Carlo runs and probabilistic analysis are employed to determine an action strategy that trades off goodness and robustness. The main contributions of this paper are: the use of DBN to compute time-dependent probability propagation for desired effects; the use of GA to optimize action strategies; and the introduction of the signal-to-noise ratio (SNR) as a measure of robustness of a strategy in an uncertain environment.

The methodology can be extended to more realistic scenarios. In our examples, we assumed that the CPTs are known and time-invariant. When CPTs are elicited from many experts, they may not always be consistent with each other. In this case, the CPTs can be randomized in the Monte Carlo runs. If the CPTs are time-varying, the only change needed is to update the CPTs with time.


Fig.  15:  Alternative  Optimization  Approach 


Both GAs and DBNs are computationally expensive. Consequently, this method is best applied offline in the planning phase. If computational time is of concern, an alternative optimization approach is shown in Fig. 15. Suppose the expected values of the uncertain prior probabilities of events and of the CPTs are known. Then the Monte Carlo runs can be avoided inside the GA loop by evaluating each strategy once at these expected values. This may be advisable because most of the strategies in the solution space tend to be inferior. Once a strategy is selected, further Monte Carlo analysis may be conducted on the optimized strategy only.

In some applications, different desired effects may have different priorities. Using the Gaussian approximation, suppose the probabilities of achieving the desired effects $D_1, D_2, \ldots, D_{N_D}$ are conditionally independent random variables (nodes without direct connections in a Bayesian network). Define the mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_{N_D})$ as the expected values for the probabilities of achieving the desired effects and let the corresponding covariance matrix be $R = \mathrm{diag}[\sigma_1^2, \sigma_2^2, \ldots, \sigma_{N_D}^2]$. In this case, we can use a weighted SNR to measure the goodness of a strategy as in the following equation:

\[
\mathrm{SNR} = -10 \log_{10} \left\{ \sum_{i=1}^{N_D} \frac{w_i}{\mu_i^2} \left( 1 + \frac{3\sigma_i^2}{\mu_i^2} \right) \right\},
\qquad \sum_{i=1}^{N_D} w_i = 1
\tag{12}
\]
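A weighted SNR of this kind can be computed as in the sketch below. It assumes that (12) has the weighted larger-the-better form $-10 \log_{10} \sum_i (w_i/\mu_i^2)(1 + 3\sigma_i^2/\mu_i^2)$ with weights summing to one; the function name and the example numbers are illustrative only.

```python
import math


def weighted_snr(means, variances, weights):
    """Weighted larger-the-better SNR over several desired effects.

    means, variances -- per-effect mean and variance of the probability of success
    weights          -- priorities of the desired effects; assumed to sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(
        w / m ** 2 * (1.0 + 3.0 * v / m ** 2)
        for m, v, w in zip(means, variances, weights)
    )
    return -10.0 * math.log10(total)


# Two effects; the first is achieved more reliably than the second.
high_priority_on_first = weighted_snr([0.9, 0.5], [1e-4, 1e-4], [0.9, 0.1])
high_priority_on_second = weighted_snr([0.9, 0.5], [1e-4, 1e-4], [0.1, 0.9])
```

Shifting weight toward an effect that is achieved with lower probability decreases the score, so the weights let the planner express which desired effects matter most.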


In addition to its application in offline strategy planning, the methodology introduced in this paper may be used in the strategy execution phase. The strategy obtained by the offline planning is open-loop in that it is an action sequence based on the current forecast of future events [Bertsekas95]. However, in the strategy execution phase, the strategy can be made open-loop feedback optimal based on observed events and intermediate effects. The process works as follows:

(i)  Prune  the  nodes  which  have  no  relevance  for  future  effects  given  current  observations; 

(ii)  Adjust  the  model  parameters  to  conform  with  the  current  environment; 

(iii)  Optimize  the  strategy  using  the  methodology  of  the  paper. 


Acknowledgements 

This work was supported by the Office of Naval Research under contract # N00014-00-1-0101. The authors would like to thank Kevin Murphy from UC Berkeley and Michael G. Kay et al. from North Carolina State University for providing the Matlab-based Bayesian network toolbox and genetic algorithms, respectively.

References

[Aliferis96] Aliferis, C.F. and Cooper, G.F. A Structurally and Temporally Extended Bayesian Belief Network Model: Definitions, Properties and Modeling Techniques. Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI-96). Portland, Oregon: Morgan Kaufmann.

[Barrientos98] Barrientos, M.A. and Vargas, J.E. A Framework for the Analysis of Dynamic Processes Based on Bayesian Networks and Case-based Reasoning. Expert Systems with Applications. 15, 1998.

[Bertsekas95] Bertsekas, D.P. Dynamic Programming and Optimal Control. Athena Scientific, 1995.

[Boutilier98] Boutilier, C., Dean, T. and Hanks, S. Decision Theoretic Planning: Structured Assumptions and Computational Leverage. Sept. 15, 1998.

[Davis01] Davis, P.K. Effects-based Operations. http://www.rand.org/contact/personal/pdavis/davis.online.html

[D’Ambrosio99] D’Ambrosio, B. Inference in Bayesian Networks. AI Magazine. Vol. 20, No. 2, 1999.

[Emery65] Emery, F.E. and Trist, E.L. The Causal Texture of Organizational Environments. Human Relations. 18, 1965.

[Heckerman95] Heckerman, D. Causal Independence for Probability Assessment and Inference Using Bayesian Networks. IEEE Transactions on Systems, Man & Cybernetics, Part A (Systems & Humans), Vol. 26, No. 6.

[Holland75] Holland, J. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, 1975.

[Houck95] Houck, C., Joines, J., et al. A Genetic Algorithm for Function Optimization: A Matlab Implementation. NCSU-IE TR 95-09, 1995.

[Joines94] Joines, J. and Houck, C. On the Use of Non-Stationary Penalty Functions to Solve Constrained Optimization Problems with Genetic Algorithms. In 1994 IEEE International Symposium Evolutionary Computation. Orlando.

[Jordan99] Jordan, M.I. Learning in Graphical Models. MIT Press, 1999.

[Kanazawa95] Kanazawa, K., Koller, D. and Russell, S. Stochastic Simulation Algorithms for Dynamic Probabilistic Networks. Proc. of the 11th Annual Conference on Uncertainty in Artificial Intelligence. 1995.

[Leblebici81] Leblebici, H. and Salancik, G.R. Effects of Environmental Uncertainty on Information and Decision Processes in Banks. Administrative Science Quarterly. 26, 1981.

[Madigan96] Madigan, D., Raftery, A.E., et al. Bayesian Model Averaging. Proc. AAAI Workshop on Integrating Multiple Learned Models. Portland, OR, 1996.

[McCrabb01] McCrabb, M. Uncertainty, Expeditionary Air Force and Effects-Based Operations: Concept of Operations for Effects-based Operations.

[Microsoft] http://research.microsoft.com/adapt/MSBNx/

[Murphy] Murphy, K. http://www.cs.berkeley.edu/~murphyk/Bayes/bnt.html

[Nettica] http://www.norsys.com

[Pearl88] Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.

[Phadke89] Phadke, M.S. Quality Engineering Using Robust Design. Prentice Hall, 1989.

[Russell95] Russell, S. and Norvig, P. Artificial Intelligence, A Modern Approach. Prentice Hall, 1995.

[Stender94] Stender, J., Hillerbrand, E. and Kingdon, J. Genetic Algorithms in Optimization, Simulation and Modeling. IOS Press, 1994.

[Tawfik00] Tawfik, A.Y. and Neufeld, E. Temporal Reasoning and Bayesian Networks. Computational Intelligence. Vol. 16, No. 3, August 2000.


